Index

Symbols

α (alpha)
- Bonferroni, 153–154
- definition of, 41
- level, 206
- relation to sample sizes, 365–366
- setting, 43
* (asterisk), 155
β (beta), 41
λ (half‐life), 280–282
κ (kappa), 189–190
μg (micrograms), 280–282
Π (pi), 27–28
√ (radical sign), 21
Σ (sigma), 27–28
γ (skewness coefficient), 121

A

absolute values, 23
accuracy, 37, 38, 262–264
active group, 187
actuarial life tables, 307, 311–316, 320–321
addition, 18–19
additive, 296
administrative measurements, 63
adverse events, 70
agriculture, 1
Akaike’s Information Criterion (AIC), 259, 276–277, 342
alcohol intake, 94–98
alpha (α)
- Bonferroni, 153–154
- definition of, 41
- level, 206
- relation to sample sizes, 365–366
- setting, 43
alternative hypothesis, 40, 43, 144, 150–151, 324
Alzheimer’s disease, 91, 146
amputation, 292–293
analysis of variance (ANOVA)
- assessing, 152–157
- introduction to, 11, 47–49
- using, 143–145, 158
analytic dataset, 76
analytic research, 88–90
analytic study designs, 91
analytic suite, 57–58
analyzing data, 7, 9–10, 74–76
and rule, 31
animal research, 1
ANOVA (analysis of variance)
- assessing, 152–157
- introduction to, 11, 47–49
- using, 143–145, 158
anticipated enrollment rate, 347
antilogarithm, 22, 118–119
anti‐synergy, 245–247
area under the ROC curve (AUC), 265, 280
arguments, 23
arithmetic mean, 115–116
arrays, 25–27
asbestos, 296–298
asymptotic confidence limits, 134
attrition, 366–367
average values, 11, 39–40, 141–158

B

background information, 69
backward elimination approach, 295
bad fit line, 215–216
balanced groups, 154
bar charts, 113, 126
base‐2 logarithms, 22
base‐10 logarithms, 22
base‐e logarithms, 22
baseline survival function, 342–344
baseline values, 96
beta (β), 41
bimodal (two‐peaked) distribution, 114, 117
binary logarithms, 22
binary variables, 173–174, 177, 236–238
binomial distribution, 36, 354–355
biology, 1
biopsy specimens, 188
biostatisticians, 60, 141, 161, 168–169
biostatistics, definition of, 1, 7
bivariate analysis, 11, 128, 145, 160
blinding, 66, 70, 171
blocked randomization, 67
blood pressure
- of study participants, 116–117
- variable name, 18
body mass index (BMI), 177
bolus, 280
Bonferroni adjustment, 153–156
box‐and‐whiskers charts, 127–128
Bradford Hill’s criteria of causality, 297–298

C

calculated values, 37
calculations, 8
cancer
- as a categorical variable, 236–238
- liver, 93–97
- lung, 296–298
- relation to weight, 11
- remission, 318, 325
- stages, 102
- survival data, 208, 301–306, 337, 343
candidate covariates, 291
cannabidiol (CBD), 159–166
car accidents, 274–278
case report forms (CRFs), 74
case reports, 91
case series, 91
case studies, 91
case‐control study, 90, 93–96
CAT (computerized tomography) scans, 188
categorical data, 112–114
categorical variables, 236–238
causal inference, 11–12, 87, 90–95, 247
CBD (cannabidiol), 159–166
censoring, 302–306, 329, 335
census, 33–34
center, 115
centiles, 120
central tendency, 115
central‐limit theorem (CLT), 134
charts and charting. See also graphing
- bar and pie, 113, 126
- box‐and‐whiskers, 127–128
- categorical data, 112–114
- correlation coefficients, 202–203
- hazard rates and survival probabilities, 312–315
- multiple regression, 243–245
- numerical data, 124–128
- Poisson regression, 273–276
- Receiver Operator Characteristics (ROC), 264–265
- residuals, 222–223
- scatter plots, 214, 219, 221, 238–240
- software for, 57–58
- s‐shaped data, 252–256
- student t test, 45–47
CHD (coronary heart disease), 92–93
chemotherapy, 337
chi‐square distribution, 165–166, 358–359
chi‐square test
- pros and cons, 167–169
- sample size, 171–172
- tables, 13
- using, 11, 161–167, 174
chronic obstructive pulmonary disease (COPD), 302
CI (confidence interval)
- calculating, 226
- definition of, 10, 39, 130–131
- using, 131–138, 183, 194–197
CIR (cumulative incidence rate), 178–179
CL (confidence level), 39
CL (confidence limits), 130, 134
classification tables, 262–264
ClinCalc, 172
clinical center variable, 338
clinical trials, 9–10, 61–74
cloud‐based storage, 60
CLT (central‐limit theorem), 134
cluster sampling, 84
Cochrane Collaboration, 297
code, 241
code‐based methods, 60
coding categories, 105–107
coefficient of determination, 227
coefficient of variation (CV), 120
Cohen’s Kappa, 189
cohort study, 90, 93–96
collecting data
- introduction to, 1–2, 9–10
- manually and digitally, 74, 101–110
collinearity, 246–247, 266
commercial software, 54–58
common logarithms, 22
comparing averages, 142
complete separation, 267–268
complicated formulas, 24
Comprehensive R Archive Network (CRAN), 58
computer science, 18
computerized tomography (CAT) scans, 188
concordance, 342
confidence interval (CI)
- calculating, 226
- definition of, 10, 39, 130–131
- using, 131–138, 183, 194–197
confidence level (CL), 39
confidence limits (CL), 130, 134
confidentiality, 71
confounding
- adjusting for, 145, 294–296
- criteria for, 292
- definition of, 66, 94
- residual, 68
constants, 17
contingency tables, 173–178, 184, 187–188
control group, 98, 154, 187
convenience sampling, 84–85
COPD (chronic obstructive pulmonary disease), 302
coronary heart disease (CHD), 92–93
correlation, 40, 201–202, 220
correlation coefficient
- analyzing, 203–207
- description of, 202–203
- example of, 10, 40
- straight‐line regression, 227–228
- table, 242
correlational studies, 91–93
COVID‐19, 183–184
Cox, David (biostatistician), 330
Cox proportional hazards regression, 330
Cox/Snell R‐square, 259
CRAN (Comprehensive R Archive Network), 58
CRFs (case report forms), 74
crossover design, 65
cross‐sectional studies, 87–96, 176
cross‐tabulated data
- analyzing, 160–167, 171–172
- example of, 112–113, 166
- introduction to, 11
- relation to logistic regression, 250
- tables, 173–174
cumulative incidence rate (CIR), 178–179
cumulative survival probability, 311–314
curved‐line relationships, 214
CV (coefficient of variation), 120

D

data. See also cross‐tabulated data
- analyzing and collecting, 1, 7, 9–10, 74–76, 101–110
- categorical, 112–114
- free‐text, 103
- interval and ordinal, 102
- ratio, 102, 107–108
- skewed and unskewed, 11, 114, 121, 353
- survival, 208, 301–306, 337, 343
- time, 108–110
data close‐out, 76
data dictionary, 110
data safety monitoring board (DSMB), 73, 76
data safety monitoring committee (DSMC), 73
data snapshot, 76
date data, 108–110
date of last contact, 303
DBP (diastolic blood pressure), 116–117, 123–124
deciliter (dL), 280
decision theory, 10, 39–40
degrees of freedom (df)
- calculating, 147–148, 152–153
- for chi‐square tests, 166–167, 358–359
dementia, 91, 146, 187–188
denominator, 41–42
dependent variable, 208, 213, 235–245
descriptive research, 88–90
descriptive study designs, 91
desired power, 347
desired α level, 347
determinants, 191
deviation, 119, 258–259
df (degrees of freedom)
- calculating, 147–148
- for chi‐square test, 166–167, 358–359
- numerator and denominator, 152
diabetes, 135–136, 236–240. See also Type II diabetes
diagnostic procedures, 183–188
diastolic blood pressure (DBP), 116–117, 123–124
dichotomous variables, 173–174, 249, 269
difference, 147–148, 163
difference table, 163
disease, 191, 193–194
dispersion, 115, 119
distribution center, 115
distributions
- bimodal (two‐peaked), 114, 117
- binomial, 36, 354–355
- chi‐square, 165–166, 358–359
- exponential, 356
- Fisher F, 152, 359–360
- frequency, 47–48
- leptokurtic and platykurtic, 122
- normal, 13, 36, 114, 353
- probability, 35–37
- sampling, 38
- statistical, 13
- student t, 357–358
- Weibull, 330, 356–357
District of Columbia, 34–35
division, 20
dL (deciliter), 280
dose‐response relationship, 298
double‐blinding, 66, 97
double‐precision numbers, 107
drug description, 70
drug development research, 280–282
DSMB (data safety monitoring board), 73, 76
DSMC (data safety monitoring committee), 73
Dupont, William D. (biostatistician), 60

E

ECG (electrocardiogram), 188
ecologic fallacy, 93
ecologic studies, 91–93
effect modification, 296–297
effect size
- compared to power and sample size, 45–47
- definitions of, 362–364
- example of, 39
- of importance, 158, 206, 361
efficacy objectives, 62–63
electrocardiogram (ECG), 188
elements, 24, 27–28
elimination constant rate, 282
engineering, 18
enzyme levels, 125–128
Epi Info, 59
epidemiological research, 1, 9–12
epidemiology, 191, 291–298
Epidemiology for Dummies (Mitra), 92
equal variance, 143
equations, 15, 24–25, 35–36
error, 78
estimate value, 242
estimation theory, 10
event status variable, 337
evidence levels, 91
Excel (Microsoft)
- for data collection, 103, 105, 107–110
- functions of, 57
- for log‐rank tests, 319–320
- for randomization, 67
- for straight‐line regression, 217
- for survival regression, 343
exclusion, 74
exclusion criteria, 64
expected count, 164
expected survival, 329, 331, 343–346
experimental research, 61, 88–90, 97–98
experiments, 1, 9–10, 91
expert opinion, 91
explicit constants, 17
exploratory analysis, 294
exploratory efficacy objective, 63
exploratory objectives, 62–63
exponential distribution, 356
exponential increase, 277
exponentiating, 21
exposure, 83, 175, 178, 193, 292
extrapolation, 108

F

F ratio, 152
F statistic, 228, 242
F value (value of F statistic), 155
factorials, 22
failure times, 356–357
fasting glucose values, 25–26, 153
fat intake, 92–93
FDA (Food and Drug Administration), 71
first quartile, 222
Fisher, Ronald Aylmer (biostatistician), 196
Fisher Exact test, 11, 13, 169–172
Fisher F distribution, 152, 359–360
Fisher z transformation, 204–205
fleas, 7
floating point numbers, 107
Food and Drug Administration (FDA), 71
formulas
- building blocks of, 17
- creating, 11–12
- definition of, 15
- introduction to, 1–2, 7–8
- types of, 16, 24
forward stepwise approach, 294–295
fourfold tables, 11, 173–178, 184, 187–188
fractional numbers, 107
free software, 58–60
free‐text data, 103
frequency bar charts, 113
frequency distribution, 47–48
functions, 23

G

gamma‐ray radiation, 251–256, 260–263, 267, 337–338
Gaussian distribution, 353
generalized linear model (GLM), 272–278
genetics, 1
geometric mean (GM), 118–119
GLM (generalized linear model), 272–278
glucose values, 25–26, 153
gold standard test, 183–184
good fit line, 215–216
goodness of fit, 227–228, 258–259
G*Power
- description of, 59–60, 68
- using, 158, 198, 207, 324–325
graphing. See also charts and charting
- categorical data, 112–114
- correlation coefficients, 202–203
- hazard rates and survival probabilities, 312–315
- multiple regression, 243–245
- numerical data, 124–128
- Poisson regression, 273–276
- Receiver Operator Characteristics (ROC), 264–265
- residuals, 222–223
- software for, 57–58
- s‐shaped data, 252–256
- student t test, 45–47
GraphPad, 67
Greek letters, 17
GUI (graphical user interface), 56, 58

H

h value, 344–346
half‐life (λ), 280–282
hazard rate
- definition of, 305
- from life tables, 311–315
- relation to survival rate, 333
hazard ratios (HR), 334–335, 340, 364
health insurance, 112–114
healthcare, 9–10
highway accidents, 274–278
Hill, Bradford (epidemiologist)
- Bradford Hill’s criteria of causality, 297–298
histogram, 34–35, 124–125
historical control, 142
H‐L test, 259
homogeneity of variances, 155
hormone concentration, 287–290
Hosmer‐Lemeshow Goodness of Fit test, 259
HR (hazard ratios), 334–335, 340, 364
HTN (hypertension), 90, 94–98, 177–178
human health research, 88
human subjects protection certification, 73
hyperplane, 234
hypertension (HTN), 90, 94–98, 177–178
hypothesis, 40–47, 63, 94, 247. See also null hypothesis
hypothesis‐driven analysis, 294
hypothesized cause, 83, 175, 178, 193, 292

I

ICF (Informed Consent Form), 72–73
ICH (International Conference on Harmonization), 72
icons explained, 3
identification (ID) numbers, 104
identity line, 245
imputation, 75
incidence, 191–198
incidence rate, 192–198
inclusion criteria, 64
independent t test, 148
independent variable, 208–210, 213, 215, 291–294
indicator variables, 237–238
indices, 174
individual‐level data, 160
inferential statistics, 34, 77
inferring, 10
infinity, 33
influenza, 192
Informed Consent Form (ICF), 72–73
inner mean, 118
Institutional Review Board (IRB), 72
integers, 107
interaction, 296–297
interaction terms, 237, 329
intercept, 215, 221, 224–225, 272, 329
intercept row, 224
interim analysis, 76
International Conference on Harmonization (ICH), 72
Internet sources. See also G*Power; Microsoft Excel
- ClinCalc, 172
- Comprehensive R Archive Network (CRAN), 58
- Epi Info, 59
- GraphPad, 67
- International Business Machines, 57
- National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
- National Institutes of Health (NIH), 72–73
- OpenStat and LazStats, 59
- Power and Sample Size Calculation (PS), 60, 171–172, 324–325
- SAS OnDemand for Academics (ODA), 55–56
- Statista, 34
- StatPages, 158, 190, 282
interpolation, 208
inter‐quartile range (IQR), 120
inter‐rater reliability, 188–190
interval data, 102
intervention‐related measurements, 63
interventions, 61–62
intra‐rater reliability, 188–190
IQR (inter‐quartile range), 120
IRB (Institutional Review Board), 72
iterative models, 246–247

K

Kaplan‐Meier method, 313–316, 328, 330
Kaplan‐Meier (K‐M) survival estimate, 313, 317
kappa (κ), 189–190
kilograms (kg), 218–220, 225–226, 239
Kruskal, William (statistician), 141
Kruskal‐Wallis test, 47–49, 144, 157
kurtosis, 121–122

L

Last Observation Carried Forward (LOCF), 75
last‐seen date, 303
LazStats, 59
least‐squares line, 234
left skewed data, 121
leptokurtic distribution, 122
lethal dose, 262
levels of measurement, 102–103
LFU (lost to follow‐up), 303–304, 308–309
life sciences, 1
life‐table method, 307, 311–316, 320–321
Likert scale, 102, 106
line of best fit, 215–216
linear combination, 329
linear function, 210–211
linear model, 272–273
linear regression, 210
link function, 273–274
liver cancer, 93–97
locally weighted scatterplot smoothing (LOWESS) curve‐fitting, 12, 286–290
LOCF (Last Observation Carried Forward), 75
logarithms, 21–22, 118–119
logistic regression
- basics of, 251–254
- definition of, 12, 210
- disadvantages of, 266–268
- evaluating, 257–265
- sample size for, 268–269
- using, 249–250, 255–257
log‐normal distribution, 36, 125, 353–354
log‐rank test, 317–325, 328–330
longitudinal research, 90
lost to follow‐up (LFU), 303–304, 308–309
LOWESS (locally weighted scatterplot smoothing) curve‐fitting, 12, 286–290
lung cancer, 296–298

M

Mann, Henry (professor), 141
Mann‐Whitney U test, 47–49, 143, 362
Mantel‐Cox test. See log‐rank test
Mantel‐Haenszel chi‐square test, 168
margin of error (ME), 134
marginal totals, 160
masking, 66, 70, 171
mathematical expressions, 15
mathematical operations, 18–25
matrix, 26
matrix algebra, 26
maximum value, 222
ME (margin of error), 134
mean
- arithmetic, 115–116
- compared to other values, 142–157, 362
- confidence limits, 134–135
mean square (mean Sq), 155
measurements, 63–64, 102–103
mechanical function, 279
median, 116–117, 123, 222
meta‐analyses, 97–98
metadata, 110
mice, 1, 318
micrograms (μg), 280–282
Microsoft Excel
- for data collection, 103, 105, 107–110
- functions of, 57
- for log‐rank tests, 319–320
- for randomization, 67
- for straight‐line regression, 217
- for survival regression, 343
millimeters of mercury (mmHg), 218–226, 229, 239
minimum value, 222
missing data, 74–75
Mitra, Amal K. (author)
- Epidemiology for Dummies, 92
mmHg (millimeters of mercury), 218–226, 229, 239
mode, 117
model building, 246
model fit statistics, 242
models
- generalized linear (GLM), 272–278
- linear, 272–273
- null, 228, 242, 259
- parsimonious, 293
- predictive, 228–229
- regression, 68, 208–209
molecular biology, 7
multicollinearity, 246–247
multi‐dimensional arrays, 26–27
multilevel variable, 236
multiple regression
- basics of, 234–235
- introduction to, 26
- sample size for, 247–248
- special considerations, 245–247
- using, 236–245
multiple R‐squared, 242
multiplication, 18–20
multiplicative, 296
multiplicity, 75–76
multi‐site study, 104
multi‐stage sampling, 85–86
multivariable regression, 291
multivariate analysis, 128, 145, 291

N

Nagelkerke R‐square, 259
National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
National Institutes of Health (NIH), 72–73
natural logarithms, 22
negative predictive value (NPV), 187
negatively skewed data, 121
NHANES (National Health and Nutrition Examination Survey), 82, 86, 93, 148–157
NIH (National Institutes of Health), 72–73
nominal variables, 102
non‐code‐based methods, 60
nonlinear function, 211
nonlinear least‐squares regression, 12
nonlinear regression, 279–286
nonlinear trends, 277
nonparametric regression, 286–290
nonparametric tests, 48–49, 157
non‐proportional hazards, 323
non‐sampling error, 78
non‐steroidal anti‐inflammatory drugs (NSAIDs), 159–166
normal distribution, 13, 36, 114, 353
normal Q‐Q graph, 222–223
normal‐based confidence intervals, 134
normal‐based confidence limits, 134
normality assumption, 143
not rule, 31
notches, 128
NPV (negative predictive value), 187
NSAIDs (non‐steroidal anti‐inflammatory drugs), 159–166
nuisance variables, 145
null hypothesis
- definition of, 40
- evaluating, 42–43, 150, 152–153, 322, 351–352
- example of, 161–165, 319
null model, 228, 242, 259
numerator, 41–42
numerical data, 107–109, 114–123

O

obesity, 177–178, 182–183
observational research, 88–90
observed count, 164
observed versus predicted graph, 245
ODA (SAS OnDemand for Academics), 55–56
odds, 32–33, 181
odds ratio (OR), 94–96, 181–183, 266–267
Office for Human Research Protections (OHRP), 72
one‐dimensional arrays, 25
one‐group t test, 148
one‐sided confidence interval, 133
one‐way ANOVA, 144
open‐source software, 58–59
OpenStat, 59
OR (odds ratio), 94–96, 181–183, 266–267
or rule, 31
order of operation, 24
ordinal data, 102
ordinary multiple linear regression model. See multiple regression
ordinary regression, 210
outcome, 208
outcome‐related measurements, 64
outliers, 229
overall accuracy, 185

P

p value
- definition of, 41, 242
- determining, 165–167
- evaluating, 42, 144, 147–157, 226–227, 340
- from the H‐L test, 259
paired t test, 148
paired values, 363
parabolic relationship, 214, 252–253
parallel design, 65
parameters, 33–34, 77, 208, 279
parametric tests, 48–49
parsimonious models, 293
parsimony, 293
participant identification (ID), 240
participant study identifier, 104
participants. See also sample size; samples
- enrolling, 68
- protection for, 71–73
- selecting, 64–65, 236–237
PatSat, 106
PCR (polymerase chain reaction), 184
Pearson, Karl (biostatistician), 161
Pearson Correlation test, 47–49, 227
Pearson kurtosis index, 122
percentile, 120
perfect predictor problem, 267–268
perfect separation, 267–268
periodicity, 83
PH (proportional hazards regression), 330–331, 333–334
pharmacokinetic (PK) properties, 280–282
pi (Π), 27–28
pie charts, 113
pilot study, 230
placebo, 66–67, 171, 187–188
placebo effect, 66, 187–188
plain text format, 16, 24
platykurtic distribution, 122
Plummer, Walton D. (biostatistician), 60
pointy‐topped distribution, 114
Poisson distribution, 36, 355
Poisson regression
- definition of, 12, 210
- using, 271–278
polymerase chain reaction (PCR), 184
populations, 33–37, 175
positive predictive value (PPV), 187
positively skewed data, 121
post‐hoc tests, 143, 152–157
potential confounding variables, 64
Power and Sample Size Calculation (PS)
- for chi‐square and Fisher exact tests, 171–172
- definition of, 60
- for survival comparisons, 324–325
power calculations, 47, 171–172, 361, 365
powers, 20–21, 41, 44, 45–47, 206
PPV (positive predictive value), 187
precision, 37, 38, 147
predicted values, 242
predictive model, 228–229
predictive value negative, 187
predictive value positive, 187
predictors
- introduction to, 208, 233
- in iterative models, 246–247
- in logistic models, 255–256
- in regression models, 273–274, 279
- relation to the outcome, 242, 245–246, 250
- types of, 209, 235–236
pregnancy, 171, 185–187
prevalence, 179, 186, 191–194
prevalence ratio, 179
primary diagnosis (PrimaryDx), 236–238
primary efficacy objective, 62
primary objectives, 62
primary sampling units (PSU), 86
privacy, 71
probability, 30–33
probability bell curve, 353
probability distributions, 35–37
probability of independence, 166
procedural descriptions, 70
product, of an array, 27
prognosis curves, 329, 331, 343–346
proportional hazards (PH) regression, 330–331, 333–334
proportions, 11, 135–136, 363
protective factor, 178
protractor, 113
PS (Power and Sample Size Calculation)
- for chi‐square and Fisher exact tests, 171–172
- definition of, 60
- for survival comparisons, 324–325
pseudo‐r‐squared values, 259
PSU (primary sampling units), 86
Python, 58

R

R (software)
- description of, 58
- nonlinear regression, 282–286
- odds ratio calculation, 183
- risk ratio calculation, 180–181
- straight‐line regression, 221
r value, 203–206
radiation exposure, 251–256, 260–263, 267, 337–338
radical sign (√), 21
random number generator (RNG), 80–81
random shuffling, 67
random variability, 158
randomization, 97, 171
randomized controlled trials (RCTs), 65–67, 98
randomness, 33
range, 120
ranks, 49
rate ratio (RR), 195–196
ratio data, 102, 107–108
rationale, 69
RCTs (randomized controlled trials), 65–67, 98
Receiver Operator Characteristics (ROC), 257, 264–265
reference level, 237
regression
- logistic
  - basics of, 251–254
  - definition of, 12, 210
  - disadvantages of, 266–268
  - evaluating, 257–265
  - sample size for, 268–269
  - using, 249–250, 255–257
- multiple
  - basics of, 234–235
  - introduction to, 26
  - sample size for, 247–248
  - special considerations, 245–247
  - using, 236–245
- multivariable, 291
- ordinary, 210
- straight‐line
  - basics of, 215–216
  - disadvantages of, 229–231
  - evaluating, 220–224
  - using, 216–220
  - when to use, 213–215
- survival
  - concepts of, 329–335
  - definition of, 210
  - evaluating, 337–343
  - sample size for, 346–347
  - using, 335–336
  - when to use, 328–329, 343
- univariate, 209
regression analysis, 12, 207–208
regression models, 68, 208–209
relative frequency, 30
relative risk, 95, 178–181
REM (Roentgen Equivalent Man), 251–254, 260–262, 267
research
- analytic, descriptive and observational, 88–90
- animal, 1
- epidemiological, 1, 9–12
- experimental, 61, 88–90, 97–98
- human health, 88
- longitudinal, 90
research studies, 1
residual information, 242
residual standard error, 222, 242
residuals, 222–224, 242–245
residuals versus fitted graph, 222–223
retinopathy, 292–293
right skewed data, 121
risk ratio, 96, 178–181
RMS (root‐mean‐square), 222
RNG (random number generator), 80–81
ROC (Receiver Operator Characteristics), 257, 264–265
Roentgen Equivalent Man (REM), 251–254, 260–262, 267
Roman letters, 17
root‐mean‐square (RMS), 222
roots, 21
Rothman’s causal pie, 297–298
round‐off error, 352
RR (rate ratio), 195–196
RStudio, 58
Rumsey, Deborah J. (author)
- Statistics For Dummies, 2, 29
- Statistics II for Dummies, 29

S

safety considerations, 70
safety objectives, 62–63
safety study, 62
sample size
- for chi‐square and Fisher exact tests, 171–172
- for cohort studies, 95–96
- compared to power and effect size, 44–47
- for comparing averages, 158
- for correlation tests, 206–207
- estimating, 198, 361–367
- introduction to, 14
- for logistic regression, 268–269
- for multiple regression, 247–248
- relation to confidence intervals, 130–131
- for straight‐line regression, 230–231
- for survival comparisons, 324–325
- for survival regression, 346–347
sample statistic, 175
samples. See also sample size
- framing, 78–79
- introduction to, 9–10, 14
- selecting, 33–37, 68, 78–80
- types of, 80–86
sampling clusters, 83–84
sampling distribution, 38
sampling error, 34, 78
sampling frame, 78
sampling strategies, 175–176
SAP (Statistical Analysis Plan), 70
SAS (Statistical Analysis System), 54–56, 58, 110
SAS OnDemand for Academics (ODA), 55–56
SBP (systolic blood pressure)
- comparing, 39
- effect of drugs on, 62, 123–125
- relation between weight and, 218–229
- variable name of, 18
scatter plots
- creating, 219, 238–240
- example of, 221
- types of, 214
Scheffe’s test, 154–156
scientific notation, 28
screening tests, 184–187
SD (standard deviation), 119, 123, 143, 222
SE (standard error)
- calculating, 147–148, 179–180
- of coefficients, 225–226, 340
- compared to confidence intervals, 130–131
- description of, 38, 129, 163, 242
- in a fraction, 41
secondary efficacy objective, 62
secondary objectives, 62
sensitivity, 185–186, 262–264
sigma (Σ), 27–28
significance, 40, 42–43, 174
significance tests. See statistical tests; specific test names
significant association, 11
significant correlation, 363–364
simple formulas, 24
simple random samples (SRS), 80–81
simple randomization, 66
simple regression, 209
simulation, 79
single‐blinding, 66
single‐precision numbers, 107
single‐site study, 104
skewed data, 11, 114, 121, 353
skewness, 121
skewness coefficient (γ), 121
slope row, 224
slopes, 215–217, 221, 224–231
smoking, 296–298, 334–335
smoothing fraction, 289–290
Social Science Statistics, 170
software
- for data collection, 105–110, 241
- evolution of, 54
- introduction to, 8
- for logistic regression, 256–257
- for power calculations, 47
- for straight‐line regression, 217
- types of, 54–60
- variables in, 18
Spearman Rank Correlation test, 47–49
specificity, 185–186, 262–264
spot‐checking, 109
SPSS (Statistical Package for the Social Sciences), 54–58
SQL (Structured Query Language), 110
square root, 332
square root law, 131
squaring, 332
SRS (simple random samples), 80–81
s‐shaped relationship, 214, 252–256
SSQ (sum of squares), 155, 215–216, 234
standard deviation (SD), 119, 123, 143, 222
standard error (SE)
- calculating, 147–148, 179–180
- of coefficients, 225–226, 340
- compared to confidence intervals, 130–131
- description of, 38, 129, 163, 242
- in a fraction, 41
statistic, definition of, 40
Statistical Analysis Plan (SAP), 70
Statistical Analysis System (SAS), 54–56, 58, 110
statistical distributions, 13
statistical estimation theory, 10, 37–39
statistical inference, 10, 37
Statistical Package for the Social Sciences (SPSS), 54–58
statistical tests, 8, 40–41, 44, 47–49. See also tests
statistically rare, 93
Statistics For Dummies (Rumsey), 2, 29
Statistics II For Dummies (Rumsey), 29
StatPages, 158, 190, 282
stepped line charts, 312–313
stepwise selection, 295
storage modes, 107
straight‐line regression
- basics of, 215–216
- disadvantages of, 229–231
- evaluating, 220–224
- using, 216–220
- when to use, 213–215
stratified samples, 81–82
strong linear relationship, 214
Structured Query Language (SQL), 110
student t distribution, 357–358
student t test, 41–42, 47–49, 142–152, 362–363
student t value, 226
study design, 2, 7, 14, 88–90
study protocol, 68–71
study rationale, 69
study title, 69
subtraction, 18–19
sum, 27
sum of squares (SSQ), 155, 215–216, 234
summarizing data
- categorical, 112–114
- numerical, 114–123
- survival, 302–316
summary statistics, 111
surveillance, 89–90
survival analysis, 13, 68, 301
survival curve shapes, 329–330
survival data, 208, 301–306, 337, 343
survival rate, 305, 364
survival regression
- concepts of, 329–335
- definition of, 210
- evaluating, 337–343
- sample size for, 346–347
- using, 335–336
- when to use, 328–329, 343
survival time, 301–306, 317–325
symbolic constants, 17
symmetry, 115
synergy, 245–247
systematic error, 37
systematic reviews, 97–98
systematic sampling, 82–83
systolic blood pressure (SBP)
- comparing, 39
- effect of drugs on, 62, 123–125
- relation between weight and, 218–229
- variable name of, 18

T

t tests, 41–42, 47–49, 142–152, 362–363
t value, 242
Tableau, 57, 60
terminal elimination rate constant, 12
test statistic, 37, 40, 41–42, 147
tests
- chi‐square, 11, 13, 161–169, 171–172, 174
- Fisher exact, 11, 13, 169–172
- H‐L, 259
- log‐rank, 317–325, 328–330
- Mann‐Whitney U, 47–49, 143, 362
- nonparametric, 48–49, 157
- post‐hoc, 143, 152–157
- Scheffe’s, 154–156
- Spearman Rank Correlation, 47–49
- student t, 41–42, 47–49, 142–152, 362–363
- Tukey‐Kramer, 154–156
- unequal‐variance t, 143
- Wilcoxon Signed‐Ranks (WSR), 47–49, 142, 146
- Wilcoxon Sum‐of‐Ranks, 47–49, 143, 157, 362
theoretical function, 279
third quartile, 222
three‐way ANOVA, 144–145
tied values, 49
time data, 108–110
time‐to‐event variable, 337
treatment bias, 66
treatment periods, 65
treatments, 187–188
trees, 1
trend line, 276
trimmed mean. See inner mean
true value, 37
Tukey‐Kramer test, 154–156
Tukey’s HSD (“honestly” significant difference test), 154–156
two‐dimensional arrays, 26
two‐peaked (bimodal) distribution, 114, 117
type I error, 41, 42–44, 75–76, 152
Type II diabetes, 10–12, 192–197, 250, 292
type II error, 41–44
typeset format, 16, 24–25
typographic effects, 16

U

ultrasound, 188
unbalanced groups, 154
under‐coverage, 78
unequal‐variance t test, 143
uniform distribution, 352
United States
- airports, 34–35
- census, 84
- Institutional Review Board, 72
- surveillance study, 86
univariate analysis, 128
univariate regression, 209
Universität Düsseldorf, 59
unskewed data, 121

V

value of F statistic (F value), 155
values. See also p value
- absolute, 23
- average, 11, 141–158
- calculated, 37
- estimate, predicted and t values, 242
- F value, 155
- h value, 344–346
- paired, 363
- positive predictive, 187
- pseudo‐r‐squared, 259
- r value, 203–206
- tied, 49
Vanderbilt University, 60
Vanderbilt University Medical Center, 324–325
variable names, 110
variable width, 128
variables
- binary, 173–174, 177, 236–238
- categorical, 236–238
- clinical center, 338
- dependent, 208, 213, 235–245
- dichotomous, 173–174, 249, 269
- event status, 337
- independent, 208–210, 213, 215, 291–294
- indicator, 237–238
- multilevel, 236
- nominal, 102
- nuisance, 145
- potential confounding, 64
variance, 119, 143, 155
variance table, 155
viruses, 1
Viya, 56, 60
volume of distribution (Vd), 280–282

W

Wallis, Wilson Allen (statistician), 141
washout intervals, 65
waves, 96, 176
weak linear relationship, 214
websites. See also G*Power; Microsoft Excel
- ClinCalc, 172
- Cochrane, 297
- Comprehensive R Archive Network (CRAN), 58
- Epi Info, 59
- GraphPad, 67
- International Business Machines, 57
- National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
- National Institutes of Health (NIH), 72–73
- OpenStat and LazStats, 59
- Power and Sample Size Calculation (PS), 60, 171–172, 324–325
- SAS OnDemand for Academics (ODA), 55–56
- Social Science Statistics, 170
- Statista, 34
- StatPages, 158, 190, 282
Weibull distribution, 330, 356–357
weight, 218–220, 225–226, 239–240, 246
Welch, Bernard Lewis (statistician), 141
Welch test, 143, 150–151
Whitney, Donald Ransom (statistician), 141
whole numbers, 107
Wilcoxon, Frank (statistician), 141
Wilcoxon Signed‐Ranks (WSR) test, 47–49, 142, 146
Wilcoxon Sum‐of‐Ranks test, 47–49, 143, 157, 362
withdrawal criteria, 64
World War II, 71, 264

X

X variable, 213–215, 225–228
x‐rays, 188

Y

Y variable, 213–216, 224–228
Yates, Frank (statistician), 168–169
Yates continuity correction, 168–169